Optimum detection for extracting maximum information from symmetric qubit sets 






Jun Mizuno, 1, 2 Mikio Fujiwara, 1, 2 Makoto Akiba, 1 Tetsuya Kawanishi, 1 Stephen M. Barnett, 3 and Masahide Sasaki 1, 2 

1 Communications Research Laboratory, Koganei, Tokyo 184-8795, Japan 
2 CREST, Japan Science and Technology Corporation 
3 Department of Physics and Applied Physics, University of Strathclyde, Glasgow G4 0NG, Scotland

(Dated: September 5, 2001) 

We demonstrate a class of optimum detection strategies for extracting the maximum information 
from sets of equiprobable real symmetric qubit states of a single photon. These optimum strategies 
have been predicted by Sasaki et al. [24]. The peculiar aspect is that detections with just three
outputs suffice for optimum extraction of information regardless of the number of signal elements.
The cases of ternary (or trine), quinary, and septenary polarization signals are studied where a 
standard von Neumann detection (a projection onto a binary orthogonal basis) fails to access the 
maximum information. Our experiments demonstrate that it is possible with present technologies 
to attain about 96 % of the theoretical limit. 

PACS numbers: 03.67.Hk, 03.65.Ta, 42.50.-p 



I. INTRODUCTION 

In communications systems a sender, Alice, represents
messages, for example the alphabet {a, b, ..., z}, by a
given set of letters $\{x_i\}$ such as {0, 1}. She transmits
sequences of the letters, in the form of codewords, through
a communication channel. A receiver, Bob, detects codewords
and thereby retrieves the message. To design an
optimum communication system, one should first know the
basic properties of distinguishing the letter set $\{x_i\}$ over
a channel. These are specified by the conditional probabilities
$P(y_j|x_i)$ that Bob finds the outcome $y_j$ when Alice
selected the letter $x_i$. The matrix of these conditional
probabilities, $[P(y_j|x_i)]$, is called the channel matrix. All
the physical properties of the channel and of the detector
are modeled through this channel matrix.

When a communication system operates at the quantum level, the
channel matrix is determined by the rules of quantum
mechanics. The physical carrier conveying a letter $x_i$
should explicitly be described by a quantum state $|\psi_i\rangle$,
which we refer to as the letter state. For example, if
the letters {0, 1} are conveyed by weak pulses of laser
light, the corresponding quantum states $\{|\psi_i\rangle\}$ are usually
nonorthogonal coherent states. Such nonorthogonal
states can never be distinguished perfectly, even in principle.
Therefore even if a channel and a detector are
completely noiseless, quantum mechanics imposes an inevitable
source of error or ambiguity in signal detection.

A detection process is represented mathematically by 
the probability operator measure (POM), which consists 
of nonnegative (generally not normalized) Hermitian op- 
erators satisfying the resolution of the identity jl|, ^, |j| : 



$$\hat\Pi_j = \hat\Pi_j^\dagger, \qquad \hat\Pi_j \ge 0, \qquad \sum_j \hat\Pi_j = \hat I. \tag{1}$$



Each element $\hat\Pi_j$ is associated with the measurement
outcome $j$ and hence implies the output letter $y_j$. If
a channel is noiseless and hence quantum limited, then
the channel matrix is given by

$$P(y_j|x_i) = \langle\psi_i|\hat\Pi_j|\psi_i\rangle. \tag{2}$$



The primary concern in quantum communication is to
determine the optimum detection strategy $\{\hat\Pi_j\}$ to
distinguish among the letter states $\{|\psi_i\rangle\}$. Each state $|\psi_i\rangle$
encodes the classical information embodied in the classical
letter $x_i$, which is selected with known prior
probability $P(x_i)$.

The meaning of 'optimum' depends on the task at hand.
The simplest requirement is that
Bob wants to decide which letter state he has received
among the set $\{|\psi_i\rangle\}$ with the smallest error. This
usually means minimizing the average error probability, or
bit error rate P c j|, [|] . A second possibility is for Bob to 
eliminate all errors by allowing the possibility of incon- 
clusive results by means of unambiguous state discrimi- 
nation Ig, §, |, H, [D§. The optimum strategy 
in this case will be the one that minimizes the probability
$P_i$ of inconclusive outcomes. This type of detection
has been proposed for quantum key distribution [14].

For the communication of messages, however, Bob does
best by devising a detection strategy so as to retrieve
Alice's message with the greatest probability. This does not
necessarily mean minimizing either $P_e$ or $P_i$, but instead
means reducing the uncertainty in some random variable
$X = \{x_i, P(x_i)\}$. Such a detection strategy is directly
related to reliable communication by coding techniques
and is actually used as a basic building block for effective
decoding procedures of codeword states formed from
the letter states $\{|\psi_i\rangle\}$. (A more detailed explanation of
this point is given in the Appendix.)

The reduction of the uncertainty caused by a detection
is quantified by the Shannon mutual information
I(X : Y) between the input (Alice's) and output (Bob's)
random variables, $X = \{x_i, P(x_i)\}$ and $Y = \{y_j, P(y_j)\}$.
This mutual information I(X : Y) can be regarded as the
amount of information extracted from X. Bob's optimum
strategy will be the one that maximizes I(X : Y). Other
figures of merit have also been considered and these
include the fidelity.
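As a concrete illustration of the quantity being maximized, the sketch below evaluates the standard expression $I(X:Y) = \sum_{i,j} P(x_i) P(y_j|x_i) \log_2 [P(y_j|x_i)/P(y_j)]$ for a given prior and channel matrix. The function name `mutual_information` and the array layout `channel[j, i]` are assumptions made for this example; they do not come from the paper.

```python
import numpy as np

def mutual_information(prior, channel):
    """Return I(X:Y) in bits.

    prior[i]      = P(x_i)
    channel[j, i] = P(y_j | x_i)   (each column sums to 1)
    """
    prior = np.asarray(prior, dtype=float)
    channel = np.asarray(channel, dtype=float)
    joint = channel * prior                    # P(y_j, x_i)
    p_y = joint.sum(axis=1, keepdims=True)     # marginal P(y_j)
    product = p_y * prior                      # P(y_j) P(x_i)
    mask = joint > 0                           # 0 log 0 is taken as 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

# Two sanity checks with equiprobable binary letters.
noiseless = mutual_information([0.5, 0.5], np.eye(2))            # perfect channel
useless = mutual_information([0.5, 0.5], np.full((2, 2), 0.5))   # output independent of input
```

A perfectly distinguishing channel yields one bit per letter, while a channel whose output is independent of the input yields zero.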

The optimum conditions are already known for minimizing
the error probability |Q, |5j. It is not an easy
task, however, to find the optimum detection strategies 
from these conditions. In fact, optimum strategies are 
only known for some special cases such as the set of bi- 
nary states, sets of symmetric states j|, |] [l?], [l8|, [l^] 
and multiply symmetric states [20]. Unambiguous state
discrimination is possible if and only if the letter states
are linearly independent, and an explicit method for
constructing the optimum strategy has been given in this
case. Finding optimum solutions for I(X : Y)
is much more difficult than for $P_e$ and $P_i$ due to
the nonlinearity of the logarithmic function in I(X : Y) with
respect to a POM. Optimum solutions are known only
for the set of binary pure states [21], and for sets of
real symmetric qubit states with equal prior probabili- 
ties ||, |||. 

It seems intuitively reasonable that we might obtain
the most information by minimizing either the average error
probability $P_e$ or the probability of inconclusive outcomes
$P_i$. In fact, the maximum mutual information for binary
states is attained by the same strategy that realizes the
minimum average error probability. There are, however,
cases where the maximum information is obtained
neither by minimizing $P_e$ nor by minimizing $P_i$ [23, 24, 25].

Devices capable of near-optimum detection at the
single-photon level have been demonstrated
in the laboratory. The simplest of these is discrimination
between binary photon polarization states with
the minimum allowed average error probability [26].
Unambiguous discrimination between two nonorthogonal
polarization states has also been demonstrated [27, 28].
A set of more than two polarization states is linearly
dependent and hence it is not possible to carry out
unambiguous state discrimination. Clarke et al. have
demonstrated state discrimination with near minimum error
probability for both the trine and tetrad polarization
states [29]. They have also demonstrated the ability to
extract more information than is possible by the best
standard von Neumann measurement (a projection onto
binary orthogonal polarization states).

In this paper we describe our experimental implementation
of a class of optimum strategies for maximizing
the mutual information, as predicted in Ref. [24]. One
of these is the ternary or trine set of states discussed
by Clarke et al. [29]. We have improved upon the
information yield obtained by these authors and have also
measured the information obtained from signals formed
from five and seven possible polarizations. Our letter
states are implemented physically as single-photon
polarizations. The required equiprobable real symmetric
qubit states are then states of linear polarization. Such
sets of states have previously found application in quantum
key distribution [30]. From the viewpoint of
fundamental interest, they might be the simplest system
with which to test the peculiar effect predicted by
Davies' theorem. According to the theorem, there must
exist at least one solution that maximizes the mutual
information and has N possible outputs, where N
is bounded by $d \le N \le d^2$, with d being the dimension
of the Hilbert space $\mathcal{H}_s$ supported by Alice's set
[23]. For real state sets, this bounding inequality becomes
$d \le N \le d(d+1)/2$ [|||. Thus for a single-photon
polarization system, one can always optimize the mutual
information by constructing a device with just three possible
outputs. This is true regardless of the number of
letter states. In the case of ternary or trine signals, the
optimum measurement consists of three symmetric state
vectors with length less than unity, and has been
demonstrated experimentally in Ref. [29]. In the cases
of quinary and septenary signals, the optimum strategies
consist of three nonorthogonal state vectors with different
lengths. In the septenary case, there are two different
configurations of measurement state vectors. We study
how each of these strategies works and the extent to which
they allow us to access the theoretical maximum amount
of mutual information.

FIG. 1: The measurement state vectors for the optimum
strategy (solid line) and the signal state vectors in the case of
the ternary (trine) signals.



II. REAL SYMMETRIC QUBIT SETS AND 
OPTIMUM DETECTION 

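Let $\{|\leftrightarrow\rangle, |\updownarrow\rangle\}$ be the orthonormal basis of linear
polarization states of a single photon. Then the real symmetric
qubit states are defined as

$$|\psi_i\rangle = \cos\frac{i\pi}{M}\,|\leftrightarrow\rangle + \sin\frac{i\pi}{M}\,|\updownarrow\rangle \qquad (i = 0, \ldots, M-1). \tag{3}$$

We assume that each state is selected with equal prior
probability 1/M. This set is one of the few quantum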
state sets for which optimum strategies for the accessible 
information are explicitly known Ell |2^. |2^, |24|] . 

TABLE I: The channel matrix of the optimum POM for the
ternary signals.

         |ψ0⟩    |ψ1⟩    |ψ2⟩
|ω0⟩     0       0.5     0.5
|ω1⟩     0.5     0       0.5
|ω2⟩     0.5     0.5     0

For M > 2, the signal states cannot be distinguished
perfectly, thus $P_i = 1$. The minimum average error
probability is

$$P_e^{\min} = 1 - \frac{2}{M}, \tag{4}$$

which is attained by the POM $\{\hat\Pi_j\}$

$$\hat\Pi_j = \frac{2}{M}\,|\psi_j\rangle\langle\psi_j| \qquad (j = 0, \ldots, M-1). \tag{5, 6}$$



This POM is unique in leading to the minimum error 
probability and has the same number of POM elements, 
corresponding to the measurement outcomes, as the let- 
ter states. 
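A quick numerical check of the minimum-error case can be sketched as follows. It assumes that the POM in question is the rescaled projector set $\hat\Pi_j = (2/M)|\psi_j\rangle\langle\psi_j|$, the standard square-root measurement for symmetric states. That form, and the helper names `letter_states`, `rescaled_projector_pom` and `average_error`, are assumptions of this example. The code only verifies that the elements resolve the identity and that the average error equals $1 - 2/M$; it does not prove optimality.

```python
import numpy as np

def letter_states(M):
    """Real symmetric qubit states: row i is (cos(i*pi/M), sin(i*pi/M))."""
    theta = np.arange(M) * np.pi / M
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

def rescaled_projector_pom(M):
    """Assumed minimum-error POM: Pi_j = (2/M) |psi_j><psi_j|."""
    return [(2.0 / M) * np.outer(v, v) for v in letter_states(M)]

def average_error(M):
    """Average error probability for equiprobable letters."""
    psi = letter_states(M)
    pom = rescaled_projector_pom(M)
    p_correct = sum(v @ P @ v for v, P in zip(psi, pom)) / M
    return 1.0 - p_correct

errors = {M: average_error(M) for M in (3, 5, 7)}
```

For the trine this gives an error probability of 1/3, and the error grows toward unity as M increases.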

In contrast, maximizing the mutual information requires
a POM with at most three rank-one elements,
corresponding to just three measurement outcomes [24].
Although it is also possible to construct optimum POMs
with more than three elements, a strategy with the minimum
number of outputs is often the one desired in practice.

If M is even, a von Neumann measurement, i.e. a pair
of orthogonal projectors, can be the optimum strategy
with minimum outputs. If M is odd, then at least three
outputs are required and a standard von Neumann
measurement fails to maximize the mutual information.
The three rank-one elements required for the optimum
POM $\{\hat\Pi_j\}$ are specified as follows:



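$$\hat\Pi_j = |\omega_j\rangle\langle\omega_j|, \tag{7}$$

with

$$
\begin{aligned}
|\omega_0\rangle &= -\sin\frac{\gamma}{2}\,|\updownarrow\rangle, \\
|\omega_1\rangle &= \frac{1}{\sqrt{2}}\left(-|\leftrightarrow\rangle + \cos\frac{\gamma}{2}\,|\updownarrow\rangle\right), \\
|\omega_2\rangle &= \frac{1}{\sqrt{2}}\left(|\leftrightarrow\rangle + \cos\frac{\gamma}{2}\,|\updownarrow\rangle\right),
\end{aligned}
\tag{8}
$$

where $\gamma$ is determined from

$$\cos\frac{\gamma}{2} = \cot\frac{m\pi}{M}, \qquad \sin\frac{\gamma}{2} = \sqrt{1 - \cot^2\frac{m\pi}{M}}, \tag{9}$$

for an integer parameter m within the range $M/4 \le m \le M/2$.
We will refer to the unnormalized vectors given in
Eq. (8) as measurement state vectors.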
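The three-output construction can be checked numerically. The sketch below builds the measurement state vectors as we read Eqs. (8) and (9), namely $|\omega_0\rangle = -\sin(\gamma/2)|\updownarrow\rangle$ and $|\omega_{1,2}\rangle = (\mp|\leftrightarrow\rangle + \cos(\gamma/2)|\updownarrow\rangle)/\sqrt{2}$ with $\cos(\gamma/2) = \cot(m\pi/M)$. The overall signs are an assumption and do not affect the probabilities. The function names are hypothetical. The code verifies the resolution of the identity and evaluates the channel matrix $P(y_j|x_i) = |\langle\omega_j|\psi_i\rangle|^2$.

```python
import numpy as np

def letter_states(M):
    """Row i is |psi_i> = (cos(i*pi/M), sin(i*pi/M)) in the (H, V) basis."""
    theta = np.arange(M) * np.pi / M
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

def measurement_vectors(M, m):
    """Rows are the unnormalized vectors |omega_0>, |omega_1>, |omega_2>."""
    c = 1.0 / np.tan(m * np.pi / M)      # cos(gamma/2) = cot(m*pi/M)
    s = np.sqrt(1.0 - c**2)              # sin(gamma/2)
    r = np.sqrt(2.0)
    return np.array([[0.0, -s],
                     [-1.0 / r, c / r],
                     [1.0 / r, c / r]])

def channel_matrix(M, m):
    """channel[j, i] = |<omega_j|psi_i>|^2 for j = 0, 1, 2 and i = 0..M-1."""
    return (measurement_vectors(M, m) @ letter_states(M).T) ** 2

trine = channel_matrix(3, 1)        # zeros on the diagonal, 0.5 elsewhere
septenary = channel_matrix(7, 3)    # compare with the m = 3 septenary table
```

In each case one entry per measurement outcome vanishes: outcome 0 never occurs for $|\psi_0\rangle$, and outcomes 1 and 2 never occur for $|\psi_m\rangle$ and $|\psi_{M-m}\rangle$, respectively.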

In the case of M = 3 (ternary or trine), the optimum
POM is given by m = 1, which results in a set of three
measurement state vectors with equal norms. The signal
and measurement state vectors are shown schematically
in Fig. 1. In this figure, each arrow represents the
polarization direction, where the horizontal and the vertical
directions correspond to the two unit basis vectors
$|\leftrightarrow\rangle$ and $|\updownarrow\rangle$,
respectively. The length of each arrow represents the
norm of the associated state vector, e.g. $|\psi_i\rangle$ or $|\omega_j\rangle$.
The optimum measurement in this case means that the
state vectors $|\psi_j\rangle$ and $|\omega_j\rangle$ are orthogonal, and thus

$$P(y_i|x_i) = \langle\psi_i|\hat\Pi_i|\psi_i\rangle = 0. \tag{10}$$

The other two possible measurement outcomes occur
with equal probabilities. This situation is summarized
in Table I.

FIG. 2: The measurement state vectors for the optimum
strategy (solid line) and the signal state vectors in the case of
the quinary signals (M = 5).

TABLE II: The channel matrix of the optimum POM for the
quinary signals.

In the cases of M = 5 (quinary) and M = 7 (septenary),
Eq. (8) results in three measurement state vectors
with two distinct norms. The relationship between
the quinary letter states and the three measurement
state vectors (with m = 2) is depicted in Fig. 2.
The channel matrix in this case is summarized in Table II.
In the septenary case, there are two different POMs with
three elements given by Eq. (7), with m = 2 and m = 3
in Eq. (9), respectively. They are depicted in Fig. 3 and
summarized in Tables III and IV. In either case, there are
combinations of (i, j) that give $P(y_j|x_i) = 0$, although j
is not necessarily equal to i (a difference from the ternary
case).

FIG. 3: The two optimum strategies in the case of the
septenary signals (M = 7). The left figure corresponds to the
choice m = 2, while the right one corresponds to the other
choice m = 3.

TABLE III: The channel matrix of the optimum POM with
m = 2 (see Eq. (9)) for the septenary signals.

TABLE IV: The channel matrix of the optimum POM with
m = 3 (see Eq. (9)) for the septenary signals.

         |ψ0⟩    |ψ1⟩    |ψ2⟩    |ψ3⟩    |ψ4⟩    |ψ5⟩    |ψ6⟩
|ω0⟩     0       0.178   0.579   0.901   0.901   0.579   0.178
|ω1⟩     0.5     0.322   0.099   0       0.099   0.322   0.5
|ω2⟩     0.5     0.5     0.322   0.099   0       0.099   0.322

The method to implement the optimum POM with
minimum outputs, as given in Eq. (7), is prescribed in
detail in Ref. [24]. In short, the nonorthogonal
measurement basis $\{|\omega_j\rangle\}$ is considered as the projection
of a three-dimensional orthonormal basis in an enlarged
space. Such an enlarged space is achieved by introducing
another independent binary basis.

In practice, the concept described above is realized
as the polarization Mach-Zehnder interferometer shown
in Fig. 4. The four-dimensional space is composed of
$\{|\leftrightarrow\rangle_a, |\updownarrow\rangle_a, |\leftrightarrow\rangle_b, |\updownarrow\rangle_b\}$, where the subscripts represent the
optical paths (a, b) indicated in Fig. 4. Our letter states
are present in the subspace spanned by the first two of these
vectors. The additional port (at b in Fig. 4) with an input
of the vacuum state $|0\rangle$ enlarges the space.

FIG. 4: Principle of the detector that realizes the optimum
POM. Here PBS stands for a polarizing beam splitter, HWP
for a half waveplate whose axis is rotated by $\theta$, and PD for a
photodetector.

The unitary operation of the Mach-Zehnder part
(indicated as U in Fig. 4) can be written as

$$
\begin{aligned}
&A'_H|\leftrightarrow\rangle_a + A'_V|\updownarrow\rangle_a + B'_H|\leftrightarrow\rangle_b + B'_V|\updownarrow\rangle_b \\
&\qquad = U\left(A_H|\leftrightarrow\rangle_a + A_V|\updownarrow\rangle_a + B_H|\leftrightarrow\rangle_b + B_V|\updownarrow\rangle_b\right),
\end{aligned}
\tag{11}
$$

where $\gamma/2$ is twice the angle of HWP1. (This $\gamma/2$
represents the angle of one of the unit basis vectors in the enlarged
space relative to the signal plane.) In our setup, the
inputs are $B_H = B_V = 0$ and hence $B'_V = 0$. Thus the
apparatus of Fig. 4 actually couples a three-dimensional
state space.

PD0 detects the $|\leftrightarrow\rangle_b$ component, whose amplitude is
given by

$$B'_H = -\sin(\gamma/2)\,A_V. \tag{12}$$

This amplitude vanishes for $|\psi_0\rangle$, so a count at PD0
guarantees that the signal was not $|\psi_0\rangle$. On
the other hand, the $|\leftrightarrow\rangle_a$ and $|\updownarrow\rangle_a$ components are further
mixed at HWP2 and PBS3, resulting in amplitudes of

$$\frac{1}{\sqrt{2}}\left(A'_H \pm A'_V\right) = \frac{1}{\sqrt{2}}\left(A_H \pm \cos\frac{\gamma}{2}\,A_V\right), \tag{13}$$

which are then detected at PD1 and PD2. By inspecting
Eqs. (12) and (13), it can be seen that the $|\omega_j\rangle$ given in Eq. (8)
are reproduced. When the condition Eq. (9) is satisfied,
the amplitude at PD1 or PD2 vanishes for one of the possible
signals ($|\psi_{k_\pm}\rangle$ with $k_+ = M - m$ and $k_- = m$), so a count
at that detector excludes that signal.



III. EXPERIMENT


The principle described in the previous section is realized
in an actual setup to confirm the theoretical results.
In the experiment, the polarization basis states
$\{|\leftrightarrow\rangle, |\updownarrow\rangle\}$ correspond
to P- (within the paper plane in Fig. 5) and S-
(perpendicular to the paper plane) polarizations, respectively.

The light source is a He-Ne laser (Spectra-Physics,
model 117A) operating at the wavelength of 632.8 nm.
The laser light of 1 mW is first attenuated by the attenuator
ATN1 by a factor of $10^{-6}$ and purified to the horizontally
polarized state by the polarizing beam splitter PBS0.
The half waveplate HWP0, driven by a stepping motor,
works as a modulator to produce the set $\{|\psi_i\rangle\}$. Then
the beam is further attenuated by ATN2 by a factor of
$10^{-4}$. At the input of the Mach-Zehnder interferometer,
the light power is of order $10^{-4}$ nW ($\approx 3 \times 10^5$ photons/sec).
In other words, the beam contains about $10^{-3}$ photons in
one meter, whereas our detecting circuit is shorter than
that.
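The quoted photon flux can be cross-checked from the laser power, the two attenuation factors, and the photon energy $hc/\lambda$. The sketch below is a rough estimate that assumes no loss other than the two attenuators (losses at PBS0 and HWP0 are ignored); the variable and function names are hypothetical.

```python
H_PLANCK = 6.62607015e-34   # Planck constant, J s
C_LIGHT = 299792458.0       # speed of light, m/s

def photon_rate(power_w, wavelength_m):
    """Photons per second carried by a beam of the given power and wavelength."""
    return power_w * wavelength_m / (H_PLANCK * C_LIGHT)

# 1 mW laser, attenuated by 1e-6 (ATN1) and 1e-4 (ATN2): about 1e-13 W.
power = 1e-3 * 1e-6 * 1e-4
rate = photon_rate(power, 632.8e-9)   # photons per second at the interferometer input
per_meter = rate / C_LIGHT            # mean number of photons per meter of beam
```

Under these assumptions the rate comes out near $3 \times 10^5$ photons per second, i.e. roughly $10^{-3}$ photons per meter of beam, consistent with the figures quoted above.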

The polarization Mach-Zehnder interferometer is com- 
posed of two PBSs, PBS1 and PBS2. Each PBS is care- 
fully mounted so as to operate with an extinction ratio 
of 1 : 1000 (see below and Ref. f§). Each path of the 
Mach-Zehnder contains one half waveplate, HWP1 and 
HWP1'. The angle of HWP1 is adjusted to a quarter of
$\gamma$ in Eq. (9) so that the polarization of the light is
rotated by $\gamma/2$, whereas HWP1' is inserted for symmetry
and thus adjusted not to affect the polarization state.

FIG. 5: Experimental configuration. The same symbols as
in Fig. 4 are used, together with ATN for an attenuator. Each of Ports
0, 1, and 2 contains an APD and a silicon photodiode with
a mechanical shutter to switch the beam between them. All
PBSs are adjusted for the maximum separation of the two
polarizations, resulting in a slightly ($\approx 0.02$ rad) slanted parallelogram
arrangement for the Mach-Zehnder.

The beams from the two paths are superimposed at
PBS2, resulting in two output beams from the Mach-Zehnder.
The one corresponding to path b in Fig. 4 is
detected directly at Port 0. The beam in path a in Fig. 4 is
delivered to HWP2 at an angle of $\pi/8$ and then to PBS3,
in order to visualize the interference of the beams from
the two paths. The two outputs from PBS3 are detected
at Ports 1 and 2.

The relative path length of the Mach-Zehnder is adjusted
to the proper operating point (the minimum
at either Port 1 or Port 2) by a PZT actuator through
a feedback system utilizing the modulation-demodulation
method. Once the relative path length is adjusted, a
sample-and-hold circuit keeps the mirror position fixed
during a measurement sequence (see below), which lasts
typically 20-30 seconds.

There are two photodetectors at each port, a silicon
photodiode and an APD (avalanche photodiode, EG&G,
SPCM-AQ-141-FC) fed through a multimode optical
fiber. The former is for alignment purposes (with increased
light) and the photon counting process is carried
out with the latter, by mechanically switching the beam
between them. The coupling efficiency of the fiber is
measured to be 0.75-0.8, including the coupling lens and
the connectors before the APD. The output from each
APD is sent to a pulse counter (EG&G ORTEC, model
995) to count the number of photon-induced pulses.

FIG. 6: The relation between the measurement state vectors
(left, quinary case M = 5 in this example) and the signal
state vectors (right) with an initial offset angle $\theta_0$.

The counters are activated simultaneously by a common
trigger, typically of one-second duration and with five
repetitions. The numbers of counts in each duration
are read by a computer from all counters, so that we can
analyze statistical errors. This procedure is repeated for
each signal $|\psi_i\rangle$ with $i = 0, \ldots, M-1$, composing a full
sequence of measuring the mutual information. The ratio
of counts in the three APDs provides the channel matrix
$P(y_j|x_i)$, from which the mutual information is derived.
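The step from raw counts to an estimated channel matrix can be sketched as below. The array layout `counts[r, j, i]` (repetition r, port j, signal i), the function name, and the example numbers are all assumptions made for illustration; the counts shown are invented and are not measured data. No dark-count subtraction or detector-efficiency correction is applied in this sketch.

```python
import numpy as np

def channel_from_counts(counts):
    """Estimate P(y_j | x_i) from photon counts.

    counts[r, j, i] = counts at port j for signal i in repetition r.
    Returns the mean and the sample standard deviation (over repetitions)
    of the count ratios normalized over the ports.
    """
    counts = np.asarray(counts, dtype=float)
    ratios = counts / counts.sum(axis=1, keepdims=True)   # normalize over ports j
    return ratios.mean(axis=0), ratios.std(axis=0, ddof=1)

# Hypothetical counts (two repetitions, trine signals), NOT measured data.
example_counts = [
    [[1000, 50500, 49800], [50200, 900, 50100], [49900, 50300, 1100]],
    [[1100, 49700, 50400], [49600, 1000, 49900], [50500, 50100, 950]],
]
estimate, spread = channel_from_counts(example_counts)
```

Each column of `estimate` sums to one by construction, and `spread` gives a simple handle on the statistical scatter between repetitions.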

As discussed in Section II, in the optimum detection
scheme proposed, the mutual information is increased by
excluding one of the possible signals. Thus, realizing zero
probabilities at the output ports is essential in achieving
high mutual information. In practice, however, there are
several causes that increase the probability at an output
where ideally zero is expected. Among them, the most
pronounced are the pulses from an APD without
any light (APD error), the finite extinction ratio of a
PBS (PBS error), and the finite contrast of interference
(interferometer error).

Without any light at all, the average dark counts
of the APDs were measured to be slightly less than
100 counts/sec. Although the whole interferometer is
enclosed in a box, the environmental light increases the
number of counts to around 300 counts/sec, even if no
laser light is injected. When the laser light is injected,
the leak light due to the imperfection of the interferometer
is added, and the average count was measured to be
around 1000 counts/sec for the output port at which no
count is expected ideally (see Tables I-IV). The last
increment is considered to be the contribution from the
PBS errors and the interferometer error. At the ports
for which finite counts are expected, we had counts
of order $10^5$ counts/sec at most, which is within the linear
range of the APDs.

In general, a PBS has an angular-dependent separation
of two polarization components. In our case, it turned
out to be possible to achieve a separation better than
1 : 1000 for both polarization components, by carefully
aligning the angle of incidence to be slightly ($\approx 0.02$ rad)
different from the standard value $\pi/4$. Then the expected
contrast is $\approx 0.998$, which we thought sufficient for our
experiment. We adopted this angle in our polarization
Mach-Zehnder interferometer, resulting in a parallelogram
arrangement (see Fig. 5).
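The expected contrast quoted here is consistent with a rough estimate. The estimate assumes that contrast means the usual fringe visibility and that a leak fraction $\epsilon = 10^{-3}$ sets the ratio of minimum to maximum output power; both are assumptions of this sketch rather than statements from the text:

```latex
\[
V = \frac{P_{\max} - P_{\min}}{P_{\max} + P_{\min}}
  = \frac{1 - \epsilon}{1 + \epsilon}
  \approx 1 - 2\epsilon = 0.998,
  \qquad \epsilon = \frac{P_{\min}}{P_{\max}} = 10^{-3}.
\]
```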

The actual contrast obtained with this interferometer
can be as high as

$$\frac{P_{\max} - P_{\min}}{P_{\max} + P_{\min}} = 0.98, \tag{14}$$

though the typical values under normal experimental
conditions were slightly lower than this. Thus, the contrast is
limited not by the PBS imperfection but by, e.g., the
spatial mode mismatch of the two beams.

In order to analyze the performance of our detector
circuit, we measured not only the mutual information of
the optimum detection scheme but also its dependence on
the relative angle between the signal set $\{|\psi_i\rangle\}$ and the
measurement state vectors $\{|\omega_j\rangle\}$. This is relevant to,
for example, the possible rotation of polarization in the
transmitting fiber. We measured the mutual information
for the signal set $\{|\psi_i(\theta_0)\rangle\}$, where

$$|\psi_i(\theta_0)\rangle = \cos\left(\frac{i\pi}{M} + \theta_0\right)|\leftrightarrow\rangle + \sin\left(\frac{i\pi}{M} + \theta_0\right)|\updownarrow\rangle \qquad (i = 0, \ldots, M-1), \tag{15}$$

as a function of the initial offset angle $\theta_0$ (the optimum
detection corresponds to $\theta_0 = 0$). The relation between
$|\psi_i(\theta_0)\rangle$ and $|\omega_j\rangle$ is depicted in Fig. 6 for the case of
quinary signals. In the experiment, $\theta_0$ was changed in
steps of $\pi/90$ radian (two degrees).
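The ideal dependence on the offset angle can be sketched numerically. The code below assumes the measurement state vectors $|\omega_0\rangle = -\sin(\gamma/2)|\updownarrow\rangle$ and $|\omega_{1,2}\rangle = (\mp|\leftrightarrow\rangle + \cos(\gamma/2)|\updownarrow\rangle)/\sqrt{2}$ with $\cos(\gamma/2) = \cot(m\pi/M)$ (our reading of Eqs. (8) and (9)), equiprobable letters, and a noiseless, lossless detector. The function name is hypothetical.

```python
import numpy as np

def offset_mutual_information(M, m, theta0):
    """Ideal I(X:Y) in bits for the signal set rotated by theta0 (radians)."""
    c = 1.0 / np.tan(m * np.pi / M)               # cos(gamma/2)
    s = np.sqrt(1.0 - c**2)                       # sin(gamma/2)
    omega = np.array([[0.0, -s], [-1.0, c], [1.0, c]])
    omega[1:] /= np.sqrt(2.0)
    theta = np.arange(M) * np.pi / M + theta0     # rotated signal angles
    psi = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    joint = (omega @ psi.T) ** 2 / M              # P(y_j | x_i) P(x_i)
    p_y = joint.sum(axis=1, keepdims=True)        # P(y_j)
    product = np.broadcast_to(p_y / M, joint.shape)
    mask = joint > 0                              # 0 log 0 is taken as 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

# Scan from -pi/6 to pi/6 in two-degree steps for the trine signals.
offsets = np.arange(-15, 16) * np.pi / 90
trine_curve = [offset_mutual_information(3, 1, t) for t in offsets]
```

For the trine the curve peaks at $\theta_0 = 0$ with $\log_2 3 - 1 \approx 0.585$ bits and drops to 1/3 bit at $\theta_0 = \pm\pi/6$, where the same device acts as the minimum-error discriminator.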



IV. RESULTS 

We carried out the optimum measurements described
in the preceding sections for the sources comprising the ternary
(trine), quinary and septenary states. For the septenary
signal states, both of the two optimum detection schemes
(with m = 2 and m = 3 in Eq. (9)) were tested.

Fig. 7 shows the relative output counts at the three
detectors as the polarization of the input light is varied
in the ternary case. This relative power corresponds to
the probability for the measurement outcome to occur
for a single input photon.

For the polarization angles $\{-\pi/6, \pi/6, \pi/2\}$ we are
performing the state discrimination with the minimum
error probability, while for the angles $\{-\pi/3, 0, \pi/3\}$
we are realizing a measurement that allows unambiguous
elimination of one possibility among the three letter
states. These measurements were referred to as the
trine and anti-trine measurements in Ref. [29]. These authors
found an rms deviation of 3.8% from the theoretical
value given in Table I. Our results indicate a lower
value of 1.1%. The reason for our lower value is that we
have been able to achieve a smaller PBS error.

[Plot: relative output (0 to 1) at Port 0, Port 1, and Port 2 versus polarization angle (radian), from $-\pi/2$ to $\pi/2$.]

FIG. 7: The dependence of the relative outputs at the three
APDs on the polarization angle of the injected beam in the
ternary experiment.

The data depicted in Fig. 7 lead to the mutual information
presented in Fig. 8. At the optimum operating
point, corresponding to the best detection strategy, we
clearly find that the mutual information exceeds that
attainable with the best von Neumann measurement. Our
value also exceeds that obtained earlier by Clarke et
al. [29], represented as triangles in our figure. The reason
for this is again the smaller PBS error. Our experimental
value is slightly lower than the theoretical maximum
and this is due mainly to a residual PBS error of
approximately 0.1% and also to the imperfect contrast of
interference. It was found [32] that although the PBS error
is not the limiting factor of the interference contrast, it
has non-negligible effects on the mutual information.

FIG. 8: The dependence of the mutual information on
the initial offset angle $\theta_0$ in the ternary experiment
("experiment", pluses). The ideal case ("ideal", solid curve) and
the ideal von Neumann case ("Neumann", dashed curve) are
shown for comparison. The values in an earlier experiment
[29] ("Clarke", triangles at $\theta_0 = 0$ and $-\pi/6$) are also shown.

Fig. 9 shows the relative output counts at our three
detectors for the quinary case. These provide the data
with which to calculate the mutual information depicted
in Fig. 10. Our data show a marginal increase in the
mutual information beyond the value that may be attained
with the best von Neumann measurement. The difference
between our experimental result and the theoretical
value is again principally attributable to the PBS error
and the imperfect contrast.

FIG. 9: The dependence of the relative outputs at the three
APDs on the polarization angle of the injected beam in the
quinary experiment.

FIG. 10: The dependence of the mutual information on the
initial offset angle $\theta_0$ in the quinary experiment. The symbols
are the same as in Fig. 8.



As mentioned earlier, the optimum detection scheme
increases the amount of mutual information by
excluding one of the possible signals. With three detectors,
only three signals can be excluded at most, and the
remaining signals do not contribute much to the mutual
information. This fact reduces the maximum mutual
information in the quinary case (and in the septenary case as well)
from that in the ternary case. Although the absolute
difference ($\approx 0.02$) between the experimental and ideal
values in the quinary case is similar to that in the ternary
case, the excess over the von Neumann measurement
became only marginal.

Fig. 11 shows the mutual information derived with the
two possible optimum detection schemes for the septenary
case. Even in the ideal case, the increase in the
attainable mutual information over that found using the
best von Neumann measurement is quite small. In both
cases our experimental values failed to reach even the
value attainable by means of the best von Neumann
measurement.

The result with m = 3 shows a higher mutual information
than that with m = 2. This difference is not due to
an experimental failure, but to the different
influence of imperfect contrast in the two cases.
The reduced contrast increases the light leaking towards
the port where ideally no light is expected, which in turn
reduces the mutual information. The absolute amount
of leak light is proportional to the amount of light
interfering. This qualitatively explains the difference between the
experimental results with m = 2 and m = 3. In the
former case the interfering light is greater than in the latter,
thus the influence on the mutual information is larger.

FIG. 11: The dependence of the mutual information on the
initial offset angle $\theta_0$ in the septenary experiment with m = 2
(crosses) and m = 3 (pluses). Other symbols are the same as
in Fig. 8.



V. DISCUSSION AND CONCLUDING 
REMARKS 

Our ability to communicate classical information by
means of a quantum channel is limited by the existence
of nonorthogonal quantum states and the associated
restrictions in discriminating among them. These factors
are fundamental to quantum as distinct from classical
information theory and make quantum key distribution
possible [30].



The optimum use of a quantum communication channel
is closely related to the maximization of mutual
information, as discussed in the Appendix. The accessible
information is obtained by maximizing the mutual
information through the selection of the detection process. There
are only a very few examples of signal states for which
the accessible information is known [22, 23, 24]. One
such example is that of the real symmetric qubit states 
II- 

In this paper we have described our polarization Mach-Zehnder
interferometer that was designed to extract the
accessible information from signals formed from symmetric
polarization states. For the ternary (trine) states,
our results yielded an amount of information close to (96% of)
the theoretical limit. Our value for the mutual
information exceeds that reported in an earlier experiment
[29]. The difference between our measured value
for the mutual information and the theoretical limit is
due principally to the leakage of the 'wrong' polarization
through our polarizing beam splitters and also to
the imperfect contrast. The effect of this leakage is more
pronounced when we consider the quinary and septenary
signal states. Our experiments suggest that optimum
quantum communication based on the ternary (trine)
polarization states, for example the quantum key dis- 
tribution by the Phoenix-Barnett-Chefles protocol pTJ , 
should be feasible. Schemes based on the quinary and 
septenary states will present a greater challenge. 

From a fundamental point of view, the quinary and septenary states
are the simplest cases in which the maximum amount of information
can be extracted by a detection whose number of possible outputs is
smaller than the number of input states. Davies' theorem predicts
that a device with three possible outputs suffices for any real
polarization system of a single photon. In our experiment, this
prediction has been tested to within the PBS error. For a complete
confirmation, further study may be necessary, e.g. a comparison of
the minimum-output optimum detection with the one corresponding to
the group-covariant optimal solution, which has the same number of
outputs as inputs.

APPENDIX A: MUTUAL INFORMATION

In this Appendix we give the definition of the mutual information
and explain its operational meaning. The primary concerns of
information theory are how to represent messages as effectively as
possible and how to transmit messages as precisely as possible. The
mutual information is related to the second problem.

A sender has a source of messages S and selects one of a known set
{a, b, ..., z} with given prior probabilities {P(a), P(b), ..., P(z)}.
This source may be characterized by the random variable
S = {a, b, ..., z; P(a), P(b), ..., P(z)}. The sender represents each
of these messages by a sequence drawn from a given set of letters
$\{x_i\}$ such as {0, 1}. These are the symbols running through the
transmission channel. Each message is then represented by a codeword
formed from a sequence of letters. This is source coding. Information
theory tells us that the effectiveness of source coding can be
measured by the minimum average length required for a codeword, and
that it is given by the Shannon entropy

\[ H(S) = -\sum_{A=a,b,\ldots} P(A) \log_2 P(A). \tag{A1} \]

This is a measure of uncertainty in the random variable 
S. It takes its maximum value when all elements appear 
with equal probability, that is, when we know nothing 
better than a random guess for each element. This mea- 
sure of uncertainty is regarded as the amount of informa- 
tion required to represent S. 
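
As a quick numerical illustration of Eq. (A1), the following Python sketch evaluates the entropy of a 26-message source. The function name `shannon_entropy` and the skewed distribution are illustrative assumptions and are not taken from the experiment.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution, as in Eq. (A1)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability terms are skipped, since p*log(p) -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 26 equiprobable messages a..z: the case of maximum uncertainty.
uniform = [1 / 26] * 26
# Hypothetical skewed source: one message takes half of the probability.
skewed = [0.5] + [0.5 / 25] * 25

print("uniform:", shannon_entropy(uniform), "bits")
print("skewed: ", shannon_entropy(skewed), "bits")
```

The uniform source reaches log2(26) bits, and any departure from equal probabilities lowers the entropy, in line with the statement above.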

A channel is usually subject to various types of noise 
disturbances. Information theory provides means and 
limits for reliable information transmission with such 
noisy channels. The key idea is to introduce some redun- 
dancy in the codeword representation prior to transmis- 
sion so as to allow the correction of errors at the receiving 
side. This entails adding some redundant letters to the 
codewords and hence increases their length. This is chan- 
nel coding. The mutual information quantifies how much 
redundancy is required for error-free transmission. 

The output from the source encoder is a sequence of the letters
forming the codewords representing the messages. For such sequences
one can find the frequency of appearance $P(x_i)$ of each letter
$x_i$. Thus we can define a random variable $X = \{x_i; P(x_i)\}$ for
the outputs from the source encoder. This is the set of inputs to the
channel. A mathematical model for the channel is specified by the set
of possible outputs $\{y_j\}$ and the conditional probability
$P(y_j|x_i)$ for each input. Given X, $\{y_j\}$, and $[P(y_j|x_i)]$,
we can determine the existence or nonexistence of encoders and
decoders that achieve a given level of transmission performance.

The mutual information is defined between the input and output
random variables X and $Y = \{y_j; P(y_j)\}$. Here

\[ P(y_j) = \sum_i P(y_j|x_i) P(x_i) \tag{A2} \]

is the probability of having $y_j$. The uncertainty of the input
random variable X is measured by the Shannon entropy

\[ H(X) = -\sum_i P(x_i) \log P(x_i), \tag{A3} \]

defined in a similar way to Eq. (A1).

If the receiver detects the output signal $y_j$, then he is now more
certain about X. The new probability distribution conditioned on
$y_j$ is given as

\[ P(x_i|y_j) = \frac{P(y_j|x_i) P(x_i)}{P(y_j)}. \tag{A4} \]



One can then define the average conditional entropy by

\[ H(X|Y) = -\sum_j P(y_j) \sum_i P(x_i|y_j) \log P(x_i|y_j). \tag{A5} \]

This quantifies the uncertainty about X that remains once the
conditioning variable Y is known. The information extracted by the
receiver is naturally defined by the reduction of the uncertainty,



\[
\begin{aligned}
I(X:Y) &= H(X) - H(X|Y) \\
       &= \sum_{i,j} P(x_i) P(y_j|x_i) \log \frac{P(y_j|x_i)}{P(y_j)}.
\end{aligned}
\tag{A6}
\]



This I(X : Y) is the mutual information between X 
and Y. 
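
Eqs. (A2) and (A6) can be evaluated directly once the prior probabilities and the channel matrix are given. The Python sketch below does this. The function name `mutual_information` and the binary symmetric channel with a 10 % flip probability are assumptions chosen only for illustration, and the logarithm is taken to base 2 so that the result is in bits.

```python
import math

def mutual_information(prior, channel):
    """I(X:Y) in bits, following Eq. (A6).

    prior[i]      = P(x_i)
    channel[i][j] = P(y_j | x_i)
    """
    n_out = len(channel[0])
    # Output distribution, Eq. (A2): P(y_j) = sum_i P(y_j|x_i) P(x_i).
    p_y = [sum(prior[i] * channel[i][j] for i in range(len(prior)))
           for j in range(n_out)]
    info = 0.0
    for i, p_x in enumerate(prior):
        for j in range(n_out):
            p_joint = p_x * channel[i][j]
            if p_joint > 0:  # terms with zero joint probability vanish
                info += p_joint * math.log2(channel[i][j] / p_y[j])
    return info

# Illustrative channel (an assumption): each letter is flipped with probability eps.
eps = 0.1
bsc = [[1 - eps, eps],
       [eps, 1 - eps]]

print("noisy channel:    ", mutual_information([0.5, 0.5], bsc), "bits")
print("noiseless channel:", mutual_information([0.5, 0.5], [[1, 0], [0, 1]]), "bits")
print("useless channel:  ", mutual_information([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]), "bits")
```

The noiseless channel returns the full entropy H(X) = 1 bit, while a channel whose output is independent of the input returns zero, which are the two limiting cases of Eq. (A6).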

Now let us consider block coding of length n. The output from the
source encoder is a letter sequence, which is divided into blocks
(message blocks) of length k (< n). Each block is supplemented by an
additional block (correction block) of $n - k$ letters to compose a
transmission codeword $\{x^p\}$:

\[
\overbrace{x^p_1 x^p_2 \cdots x^p_k}^{\text{message block}} \;
\overbrace{x^p_{k+1} x^p_{k+2} \cdots x^p_n}^{\text{correction block}}
\quad (\text{for } p = 1, 2, \ldots, L^k),
\tag{A7}
\]



where each $x^p_l$ ($l = 1, \ldots, n$) is an element of the set of
possible letters $\{x_i;\ i = 0, 1, \ldots, L-1\}$. Note that although
there are $L^n$ possible sequences of length n in total, only a part
of them, i.e. $L^k$ sequences, are used as codewords. This
redundancy, together with an appropriate choice of correction blocks,
allows us to correct possible errors in transmission.

The input codeword $x^p$ may be disturbed in the channel so as to
come out as a different sequence $y^q = y^q_1 y^q_2 \cdots y^q_n$.
The channel decoder processes this output sequence and assigns to it
the sequence that should be the correct input codeword. The average
error in this decoding should be as small as possible, while the
redundancy $n - k$ should also be as small as possible. In other
words, we wish to attain a small decoding error while keeping the
ratio $R = k/n$, the so-called transmission rate, as large as
possible.
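
The trade-off between redundancy and decoding error can be seen in the simplest block code, the binary repetition code with k = 1 and n = 3. The Python sketch below is an illustration under stated assumptions: the repetition code, majority-vote decoding, and the 10 % letter-flip probability are choices made here and are not discussed in the paper.

```python
from collections import Counter
from itertools import product

def encode(message_block, n):
    """Repetition code with k = 1: one message letter followed by
    n - 1 correction letters that simply repeat it."""
    (letter,) = message_block
    return (letter,) * n

def decode(received):
    """Majority vote over the received block (binary letters, n odd)."""
    return Counter(received).most_common(1)[0][0]

def decoding_error_probability(eps, n):
    """Probability that decode() returns the wrong letter when each letter
    is flipped independently with probability eps."""
    sent = encode((0,), n)
    p_err = 0.0
    for flips in product((0, 1), repeat=n):  # every possible error pattern
        received = tuple(s ^ f for s, f in zip(sent, flips))
        weight = sum(flips)
        if decode(received) != 0:
            p_err += eps**weight * (1 - eps)**(n - weight)
    return p_err

L, k, n = 2, 1, 3
codewords = [encode(m, n) for m in product(range(L), repeat=k)]
print(len(codewords), "codewords out of", L**n, "sequences, R =", k / n)
print("block error for 10 % letter error:", decoding_error_probability(0.1, n))
```

Only L^k = 2 of the L^n = 8 sequences are used as codewords. The price of the lower error probability is the reduced transmission rate R = 1/3.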

Let us suppose that encoding is made under the constraint that the
frequency with which each $x_i$ occurs in the set of codewords
$\{x^p\}$ is $P(x_i)$. Information theory says that by an appropriate
design of the coding scheme it is possible to transmit the messages
with an error probability as small as desired if $R < I(X:Y)$ is
satisfied. For the fixed channel model $[P(y_j|x_i)]$, one may
further adjust the prior probabilities $\{P(x_i)\}$ to maximize the
mutual information. The maximum value

\[ C = \max_{\{P(x_i)\}} I(X:Y) \tag{A8} \]

is called the channel capacity. Then the channel coding
theorem tells us [33, 34] that if R < C holds there
exists a coding scheme which transmits messages with an 
error probability as small as desired. Thus the mutual 
information is related to the ultimate use of the channel. 
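
The maximization over prior probabilities in Eq. (A8) can be carried out numerically for a small channel. The Python sketch below uses a plain grid search for a two-letter input. The function names, the grid resolution, and the binary symmetric channel are assumptions made for illustration; more efficient algorithms exist but are not needed here.

```python
import math

def mutual_information(prior, channel):
    """I(X:Y) in bits from P(x_i) and P(y_j|x_i), following Eq. (A6)."""
    n_out = len(channel[0])
    p_y = [sum(prior[i] * channel[i][j] for i in range(len(prior)))
           for j in range(n_out)]
    info = 0.0
    for i, p_x in enumerate(prior):
        for j in range(n_out):
            p_joint = p_x * channel[i][j]
            if p_joint > 0:
                info += p_joint * math.log2(channel[i][j] / p_y[j])
    return info

def capacity_two_letters(channel, steps=1000):
    """Grid search over q = P(x_0); returns (largest I(X:Y), best q)."""
    best_info, best_q = 0.0, 0.0
    for s in range(steps + 1):
        q = s / steps
        info = mutual_information([q, 1 - q], channel)
        if info > best_info:
            best_info, best_q = info, q
    return best_info, best_q

# Illustrative channel (an assumption): letters flipped with probability 0.1.
eps = 0.1
bsc = [[1 - eps, eps],
       [eps, 1 - eps]]

capacity, q_opt = capacity_two_letters(bsc)
print("C =", capacity, "bits, attained at P(x_0) =", q_opt)
```

For this symmetric channel the maximum is reached with equiprobable letters. For an asymmetric channel matrix the optimum prior generally shifts away from 1/2.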

The basic framework described above also applies to a quantum-limited
channel. However, a new ingredient comes into play, namely a quantum
effect in the detection process. Let us consider the simplest case,
where the letter set $\{x_i\}$ is conveyed by a set of pure states
$\{|\psi_i\rangle\}$, possibly a nonorthogonal set, through a
noiseless channel. Then the channel model is specified by a POM
$\{\hat{\Pi}_j\}$ and the channel matrix
$P(y_j|x_i) = \langle\psi_i|\hat{\Pi}_j|\psi_i\rangle$. The POM
describes the measurement process and gives a quantum prescription
for generating the output letters $\{y_j\}$. In the conventional
(classical) context, the channel matrix $[P(y_j|x_i)]$ is given and
fixed. In the quantum domain, however, one may ask what is the best
possible POM for the given set of letter states $\{|\psi_i\rangle\}$.
This is actually a nontrivial problem, as discussed in the
Introduction. The problem can be decomposed into several steps. First
we can consider the maximization of the mutual information with
respect to the POM $\{\hat{\Pi}_j\}$ for fixed $\{|\psi_i\rangle\}$
and prior probabilities $\{P(x_i)\}$. The maximum value

\[
I_{\mathrm{Acc}}(\{|\psi_i\rangle; P(x_i)\})
  = \max_{\{\hat{\Pi}_j\}} I(\{|\psi_i\rangle; P(x_i)\} : Y)
\tag{A9}
\]

is called the accessible information of
$\{|\psi_i\rangle; P(x_i)\}$. We can then consider the maximization
of the accessible information over the prior probabilities
$\{P(x_i)\}$, and may define the quantity $C_1$ as

\[
C_1 = \max_{\{P(x_i)\}} I_{\mathrm{Acc}}(\{|\psi_i\rangle; P(x_i)\}).
\tag{A10}
\]

This would be a natural extension of the conventional idea. However,
this $C_1$ is not in general the upper bound on the transmission rate
for error-free communication, and hence it is not the channel
capacity. In fact, there is a peculiar quantum interference effect in
the quantum detection of codeword states, which is not taken into
account in the conventional theory. The true capacity for a
pure-state channel is given by Hausladen et al. [36]. The general
theory for a mixed-state channel is given by Holevo [37] and by
Schumacher and Westmoreland [38].

To realize reliable transmission ensured by quantum 
theory of the channel capacity, one may need quantum 
computation for the decoding process [39]. This is,
however, far beyond present technologies. If only a quantum
detection on each letter state is available, then $I_{\mathrm{Acc}}$
and $C_1$ practically specify the limit of communication ability. Let
us suppose again that encoding is made such that $x_i$ (i.e.
$|\psi_i\rangle$) occurs in the set of codewords $\{x^p\}$ (i.e.
$\{|\psi_{i_1}\rangle \otimes \cdots \otimes |\psi_{i_n}\rangle\}$)
with the probability $P(x_i)$. We further suppose that
$\{\hat{\Pi}_j\}$ is the POM attaining the accessible information for
$X = \{|\psi_i\rangle; P(x_i)\}$ and that the receiver applies this
detection separately to each letter state to get output sequences
$\{y_{j_1} y_{j_2} \cdots y_{j_n}\}$. If $R < I_{\mathrm{Acc}}$
holds, then a reliable transmission of the letters with an
arbitrarily small error is possible by an appropriate classical
coding. The optimum POM for the accessible information is thus an
important concern for devising a good code for a quantum-limited
channel.
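
To make the role of the POM concrete, the Python sketch below builds the channel matrix from the overlap between POM elements and letter states for the equiprobable trine states and compares two detections. Several points are assumptions made here: the parametrization of the states by angles 0, π/3 and 2π/3, the three-output POM whose elements are each orthogonal to one signal state (the form commonly quoted as optimum for the trine), and the 0.5° grid used to scan two-output projective measurements. The code evaluates mutual information only and does not prove optimality.

```python
import math

def mutual_information(prior, channel):
    """I(X:Y) in bits from P(x_i) and P(y_j|x_i), following Eq. (A6)."""
    n_out = len(channel[0])
    p_y = [sum(prior[i] * channel[i][j] for i in range(len(prior)))
           for j in range(n_out)]
    info = 0.0
    for i, p_x in enumerate(prior):
        for j in range(n_out):
            p_joint = p_x * channel[i][j]
            if p_joint > 0:
                info += p_joint * math.log2(channel[i][j] / p_y[j])
    return info

def channel_matrix(states, directions, weight):
    """P(y_j|x_i) = weight * <phi_j|psi_i>^2 for real rank-one POM
    elements weight * |phi_j><phi_j|."""
    return [[weight * (p[0] * d[0] + p[1] * d[1]) ** 2 for d in directions]
            for p in states]

# Trine letter states as real unit vectors (assumed parametrization).
thetas = [k * math.pi / 3 for k in range(3)]
states = [(math.cos(t), math.sin(t)) for t in thetas]
prior = [1 / 3] * 3

# Three-output POM: element j is (2/3)|phi_j><phi_j| with phi_j orthogonal to psi_j.
anti = [(-math.sin(t), math.cos(t)) for t in thetas]
povm_channel = channel_matrix(states, anti, 2 / 3)
i_povm = mutual_information(prior, povm_channel)

def von_neumann_information(alpha):
    """Mutual information for a projection onto the basis rotated by alpha."""
    basis = [(math.cos(alpha), math.sin(alpha)),
             (-math.sin(alpha), math.cos(alpha))]
    return mutual_information(prior, channel_matrix(states, basis, 1.0))

# Scan the basis angle on a 0.5 degree grid (grid resolution is an assumption).
i_vn = max(von_neumann_information(s * math.pi / 360) for s in range(360))

print("three-output POM:         ", i_povm, "bits")
print("best two-output projection:", i_vn, "bits")
```

The three-output detection yields log2(3/2) bits, while no projective measurement on the grid comes close to it, which illustrates why the choice of POM matters for a quantum-limited channel.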